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Abstract 

Background: Tlie major medicinal alkaloids isolated from Uncaria rhynchophylla (gouteng in Chinese) capsules are 
rhynchophylline (RIN) and isorhynchophylline (IRN). Extracts containing these terpene indole alkaloids (TIAs) can 
inhibit the formation and destabilize preformed fibrils of amyloid (3 protein (a pathological marker of Alzheimer's 
disease), and have been shown to improve the cognitive function of mice with Alzheimer-like symptoms. 
The biosynthetic pathways of RIN and IRN are largely unknown. 

Results: In this study, RNA-sequencing of pooled Uncaria capsules RNA samples taken at three developmental 
stages that accumulate different amount of RIN and IRN was performed. More than 50 million high-quality reads 
from a cDNA library were generated and de novo assembled. Sequences for all of the known enzymes involved in 
TIAs synthesis were identified. Additionally, 193 cytochrome P450 (CYP450), 280 methyltransferase and 144 isomerase 
genes were identified, that are potential candidates for enzymes involved in RIN and IRN synthesis. Digital gene 
expression profile (DGE) analysis was performed on the three capsule developmental stages, and based on genes 
possessing expression profiles consistent with RIN and IRN levels; four CYP450s, three methyltransferases and three 
isomerases were identified as the candidates most likely to be involved in the later steps of RIN and IRN biosynthesis. 

Conclusion: A combination of de novo transcriptome assembly and DGE analysis was shown to be a powerful 
method for identifying genes encoding enzymes potentially involved in the biosynthesis of important secondary 
metabolites in a non-model plant. The transcriptome data from this study provides an important resource for 
understanding the formation of major bioactive constituents in the capsule extract from Uncaria, and provides 
information that may aid in metabolic engineering to increase yields of these important alkaloids. 
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Background 

Alzheimer is a common disease that threatens the health 
of the elderly [1-3]. Uncaria a member of the Rubiaceae 
family, has long been used in traditional Chinese medicine 
to treat hypertension, convulsions, tremor, and stroke [4]. 
Indole alkaloids present in aqueous solution have been 
shown to interfere with fiber formation of |3- amyloid 
protein (Ap) and destabilize preformed A|3 fibrils, a patho- 
logical marker of Alzheimer's disease [5]. An ethanol 
extract of Uncaria was found to improve the cognitive 
impairment of Alzheimer's disease caused by D-galactose 
in mice [6]. RIN and IRN two major alkaloid chemicals 
that are synthesized in Uncaria capsules, can inhibit the 
activation of microglia and reduce A|3-induced death in 
cell lines used to study neuronal differentiation (PC12) [7]. 
Also RIN reduces Ap cytotoxicity by inhibiting intracellu- 
lar calcium overloading and tau protein hyperphosphory- 
lation [8,9]. Extracts from Uncaria have the potential 
for delaying or alleviating the symptoms of Alzheimer's 
diseases that could significantly improve the quality of life 
of patients. 

RIN and IRN belong to TIAs. TIAs are a diverse class 
of natural products that comprise over 2000 members, 
many of which possess significant physiological activity. 
Most TIAs are found in three dicotyledon plant families: 
Apocynaceae, Rubiaceae and Loganiaceae, and are thought 
to be part of a plants chemical defense against pests [10]. 
TIAs have a wide variety of different molecular structures 
and biological activities [11]. Notable TIAs include vincris- 
tine and vinblastine of Catharanthus roseus that have 
anti-tumor activity. Vinblastine is clinically used for the 
treatment of leukemia, Hodgkin's lymphoma and other 
cancers [12-15]. Ajmaline of Rauwolfia serpentina can 
block the sodium channel and resist arrhythmia [15-17]. 
Whereas, Yohimbine is antagonist of a-2-adrenoceptor 
and a potential drug for the treatment of erectile dysfunc- 
tion [18,19]. Camptothecin of Ophiorrhiza pumila and 
Camptotheca acuminata is a topoisomerase enzyme 
inhibitor, and has anti-cancer effect [20,21]. Quinine of 
Cinchona species is a known anti- malarial drug [22,23]. 
TIAs production in plants is generally low and difficult to 
synthesize chemically, limiting the utility of these thera- 
peutic chemicals unless production is increased by meta- 
bolic engineering [11]. However, metabolic engineering 
requires knowledge of the TIAs biosynthesis pathway 
which is only well understood for a few, such as vindoline 
and ajmaline. 

TIAs biosynthesis involve multiple and complex metabolic 
pathways. All TIAs are derived from tryptophan and seco- 
loganin. Tryptophan decarboxylase (TDC; EC 4.1.1.28), a 
pyridoxal dependent enzyme converts tryptophan to trypta- 
mine. Isopentenyl diphosphate (IPP) the precursor for all 
terpenoids is produced by the triose phosphate/pyruvate or 
"non-mevalonate" pathway (MEP/DOXP pathway) [24,25]. 



In the first committed step of iridoid terpene biosynthesis, 
geraniol, derived from IPP, is hydroxylated by geraniol-10- 
hydroxylase (GIOH; EC 2.5.1.1) [26,27]. Oxidation of the 
iridotrial aldehyde to the carboxylic acid is followed by 
esterification and glucosylationto, yield deoxyloganin; 
subsequent hydroxylation of deoxyloganin yields loganin. 
Secologanin is then generated by oxidative cleavage of 
loganin by the enzyme secologanin synthase (SLS; EC 
1.3.3.9) [28-33]. Biosynthesis of secologanin is thought to 
be potential rate-limiting step in TIAs biosynthesis. Trypta- 
mine of tryptophan metabolic pathway via strictosidine 
synthase (STR; EC 4.3.3.2) catalysis forms strictosidine [34], 
and then enters TIAs biosynthetic pathway [35]. This part 
of the synthesis processes is the best characterized, and the 
genes encoding the enzymes are known. However each 
individual TIAs has a synthetic branch, which is started at 
strictosidine. Through four variable precursors, (4,21-Dehy 
drogeissoschizine, cathenamine, 19-epi-cathenamine and 
stemmadenine), each active substance enters the branch 
path. From structural analysis of RIN and IRN, they may be 
transformed from Stemmadenine (Figure 1). This specula- 
tion is according to the up-stream pathway of TIAs synthe- 
sis, which is the biosynthetic process of secologanin [36,37]. 
Stemmadenine synthesized RIN and IRN through the 
chemical reactions of oxidation and methyl transfer. This 
process requires three kinds of enzymes; oxidoreductases, 
methyltransferases and isomerases. The genes coding for 
oxidoreductases, methyltransferases and isomerases are 
defined as "late step genes" of RIN and IRN biosynthesis. 

Next generation sequencing technologies have proved 
to be rapid and cost-effective means to analyze the 
genome and transcriptome in non-model species. In this 
study, knowledge of the rapid accumulation periods of 
RIN and IRN in Uncaria was used to find potential can- 
didates in the synthesis of these TIAs. Using transcrip- 
tome sequencing, de novo assembly and DGE analysis, 
candidate genes encoding putative enzymes responsible 
for late steps in RIN and IRN synthesis were identified. 
Our results lay a foundation for construction of the RIN 
and IRN biosynthetic pathways that will in turn aid the 
study of the regulation and metabolic engineering of 
RIN and IRN biosynthesis. 

Results 

De novo transcriptome sequencing of capsule 

We compared RIN and IRN content throughout the 
development of Uncaria's capsule, from fallen petal (desig- 
nated capsule 1, sample point #1) to maturity (designated 
capsule 3, sample point #5) (Figure 2). RIN and IRN con- 
tent changed dramatically during the development of the 
Uncaria capsule. The two TIAs accumulate in the cap- 
sules were detected in capsule 1 (fallen petal), until a max- 
imum is reached at sample point 3 (designated capsule 2) 
20 days later, and then drops to its lowest levels at capsule 
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Figure 1 Putative riiynciiopliylline and isorliynciiopiiylline biosyntiiesis patliway in Uncaria rhynchophylla. MEP pathway: "non-mevalonate" 
pathway; TDC: tryptophan decarboxylase, EC:4.1.1.28; STR: strictosidine synthase, EC:4.3.3.2; P450: cytochrome P450, EC:1.14.--. 

V J 



maturity (designated capsule 3) after another 20 days 
(Figure 2), To obtain an overview of the Uncaria capsule 
transcriptome in the process of RIN and IRN accumulation, 
a cDNA library was generated from an equal mixture of 
RNA isolated from capsule 1 (starting point for RIN and 
IRN accumulation), capsule 2 (peak RIN and IRN content), 
and capsule 3 (RIN and IRN content is at its lowest). 
The library was sequenced using the Illumina HiSeq™ 
2000 platform. 51 million raw reads were obtained which 
had 50,159,521 clean reads (97.49% of all raw reads) (in 
Additional file 1: Table SI), which were used for all subse- 
quent analysis. 

With the increase of the sequencing length, sequencing 
error rates will rise [38,39], but the error rate does not 
exceed 1% (in Additional file 2: Figure SI). 6 bp random 
primers were used in reverse transcription. The incomplete 
binding of random primers and template could cause se- 
quencing errors of the first six nucleotides position [39,40] 
(in Additional file 2: Figure SI, S2). Clean reads were 
spliced into transcripts by trinity software (v2012- 10-05) 
[41], and 100,940 transcripts were obtained. Transcript 
length was an index which measured with splicing effect. 
To a certain extent, N50 can evaluate splicing integrity 
[42]. The average length of transcript was 1,181 bp, the 



longest transcript was 14,101 bp, the shortest transcript 
was 201 bp (N50 was 1970 bp, N90 was 503 bp). From 
these transcripts, 55,523 unigenes were identified, with an 
average length of 717 bp, the longest unigene was 
13,307 bp, the shortest unigene 201 bp (N50 was 1242 bp, 
N90 was 280 bp). The length statistics of spliced transcripts 
and unigenes were shown in Additional file 1: Table S2. 

Annotation of gene function and CDS prediction 

In order to assign accurate annotation information to 
unigenes, multiple databases were interrogated including; 
the NCBI database Non-redundant protein sequences 
(Nr), the manually annotated and curated protein sequence 
database (Swiss Prot), NCBI nucleotide sequences (Nt), 
Protein Family (Pfam), Clusters of Orthologous Groups 
of proteins (COG), Gene Ontology (GO) and Kyoto 
Encyclopedia of Genes and Genomes Ortholog database 
(KO). Annotation results of unigenes are shown in Table 1. 

GO is a classification system to describe the properties 
of the organism genes and their products, including Bio- 
logical Process, Cellular Component and Molecular 
Function [43]. Through alignment of GO databases, 
26,692 unigenes were annotated to 57 terms of GO clas- 
sification (Figure 3). Among Biological Process, 'immune 
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Figure 2 The RIN and IRN content changes of Uncaria capsule from the fallen petal to yellow maturity (A) and biomorph of the sample 
point 13,5 (B, C D). 



system process [GO:0002376]' was annotated with 637 uni- 
genes, response to stimulus [GO:0050896]' was annotated 
with 6328 unigenes, metabolic process [GO:0008152]' was 
annotated with 15636 unigenes. 

KOG is a classification system based on orthologous 
genes. Orthologous genes have the same function and 
common ancestor. 11,150 unigenes were annotated to 26 
groups by KOG database (Figure 4). (R) General Func- 
tional Prediction Only annotated 1869 unigenes at most, 
and (X) Unnamed protein annotated 2 unigenes at least. 
(G) Carbohydrate metabolism and transport was annotated 
with 607 unigenes, (V) Defense mechanisms was anno- 
tated with 113 unigenes (Figure 4). 



Table 1 Annotation result statistics between unigenes 
and databases 



Databases 


Number of Unigene 


Percent (%) 


Nr 


28,715 


51.72 


Swiss-Prot 


19,984 


35.99 


Nt 


10,863 


19.56 


Pfam 


21,083 


37.97 


COG 


11,150 


20.08 


GO 


26,692 


48.07 


KO 


12,256 


22.07 



KO is a network diagram of cell metabolic pathways, 
including metabolism, genetic information processing, envir- 
onmental information processing, cellular processes, organ- 
ismal systems and human diseases. In this study, 12,256 
unigenes were annotated by alignment with the KO data- 
base. Among them; 4,326 unigenes were annotated to 
Metabolism (35.30% of the total), 660 to Environmental 
Information Processing (5.39%), 2,060 to Genetic Informa- 
tion Processing (16.80%), 1,340 to Organismal Systems 
(10.93%), 955 to Cellular Processes (7.79%). According to 
the KO annotation, removing unigenes associated with the 
Human Diseases, the remaining 8,745 unigenes (15.75% of 
all unigenes) were classified to 31 KEGG pathways (Figure 5). 
Among them; 269 were annotated to Metabolism of terpe- 
noids and polyketides pathway, 74 unigenes involved in 
Terpenoid backbone biosynthesis [PATHWAY: ko00900], 
17 unigenes involved in Monoterpenoid biosynthesis 
[PATHWAY: ko00902], 593 unigenes were annotated to 
Amino acid metabolism pathway, 54 unigenes involved in 
Phenylalanine, tyrosine and tryptophan biosynthesis [PATH- 
WAY: ko00400], 295 unigenes were annotated to Biosyn- 
thesis of other secondary metabolites pathway, 8 unigenes 
involved in Indole alkaloid biosynthesis [PATHWAY: 
ko00901]. 

Open Reading Frame (ORE) was obtained by compar- 
ing annotation results of unigenes with the different 
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databases, the unigenes were then determined using Cod- 
ing Sequence (CDS) according to the standard codon table 
[38]. If the results from different databases were contra- 
dictory, priority was given to Nr, Swiss-prot and KEGG 
GENES respectively. The unigenes that failed to match the 
database information the predicted protein CDS was de- 
termined using estscan software [44]. The CDS of 28,611 
unigenes (51.52% of all unigenes) could be predicted using 
the annotation databases whereas 16,566 unigenes (29.84% 
of all unigenes) CDS predictions were made using estscan 
software. 

Genes involved in the biosynthesis of TIAs 

TIAs biosynthesis involves three upstream metabolic 
pathways, including MEP pathway, iridoid glycosides 
pathway and shikimate pathway [45]. Secologanin is a 
natural active substance and iridoid glycoside product of 
TIAs biosynthetic precursor, but its synthesis process is 
not yet understood. Another precursor tryptamine; is 
generated by tryptophan using the shikimate pathway by 
catalysis with TDC. TDC is communication node be- 
tween primary metabolism and secondary metabolism. 
Unigenes annotated by comparison with the Nr database 
comparison that could encode enzymes related the 
above pathways are listed in Additional file 1: Table S3. 
All putative oxidoreductases that could represent genes 
involved in the late steps in RIN and IRN synthesis are 
listed in Additional file 1: Table S4 (A-C). 

The MEP pathway is branch of Terpenoid backbone bio- 
synthesis [PATHWAY: ko00900] and the iridoid glycosides 
pathway is branch of Monoterpenoid biosynthesis [PATH- 
WAY: ko00902]. The product of phenylalanine, tyrosine 
and tryptophan biosynthesis [PATHWAY: ko00400] enters 
indole alkaloid biosynthesis [PATHWAY: ko00901] via 



TDC and STR. Putative unigenes associated with TIAs 
synthesis were annotated by KO database (in Additional 
file 1: Table S5). The Uncaria capsule transcriptome data 
contained all genes encoding every enzyme of the MEP 
pathway and shikimate pathway. However, there are defi- 
ciencies associated with the databases as the Monoterpe- 
noid biosynthesis [PATHWAY: ko00902] in KO is only 
aligned to myrcene/ocimene synthase (EC 4.2.3.15). In the 
KEGG pathway, TDC belongs to indole alkaloid biosyn- 
thesis [PATHWAY: ko00901], which is only aligned to 
STR, not aligned to the TDC. The reason may be few stud- 
ies were conducted with TIAs biosynthesis, and reference 
data was insufficient. 

Capsule DGE analysis at different periods 

DGE sequencing analysis of capsule 1, capsule 2 and 
capsule 3 was performed by Illumina HiSeq™ 2000 se- 
quencing platforms, respectively. Capsule 1 produced 
9,270,290 clean reads, capsule 2 produced 13,298,879 
clean reads, and capsule 3 produced 15,331,847 clean 
reads. The clean reads of the samples were mapped to 
the reference sequence derived from our Uncaria tran- 
scriptome data, and obtained from the annotation infor- 
mation of each sample using RSEM software [46]. Total 
mapped reads were capsule 1 8,529,453 (92%), capsule 2 
12,258,147 (92%), and capsule 3 14,047,840 (92%) with 
approximately 47,000 unigenes annotated in each sam- 
ple. To estimate the level of gene expression the reads 
count were transformed into Reads Per Kilo bases per 
Million mapped Reads (RPKM) [47]. RPKM is the num- 
ber of reads per million reads from one gene per thou- 
sand bases, considering the effect of sequence length 
and depth of RNA-seq process on the reads count. The 
size of RPKM value therefore reflects the abundance of 
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Figure 4 Histogram of unigene KOG classification. The x-axis indicates 26 groups of KOG. Tine y-axis indicates tine percentage of tine number 
of genes annotation under tlie group in tine total number of genes annotation. 



the gene expression and we defined RPKM > 0.3 as the 
threshold of significant gene expression. The density 
distribution of RPKM can test gene expression profiles 
of samples; comparison chart of gene expression profiles 
among three samples was showed in Figure 6. Among 
them, the common genes of three samples reached 
38,592, characteristic genes of capsule 1 were 1,687, 
characteristic genes of capsule 2 were 1,901 and charac- 
teristic genes of capsule 3 were 1,986. 

Analysis of differentially expressed genes 

Read count data was used to analyze the difference of 
gene expression. Since only one biological samples was 
sequenced, in order to avoid possible high false positive 
rates we first used trimmed mean of M values (TMM) 
to standardize data processing [48], then took advantage 



of DEGseq to analyze differentially expressed genes; 
the analysis conditions were qvalue <0.005 & log2 (fold- 
change) >1 [49-51]. There were 1,488 differentially 
expressed genes identified between capsule 2 and 
capsule 1, with 803 genes were up-regulated and 685 
genes were down-regulated; 1,343 differentially expressed 
genes between capsule 3 and capsule 1, 749 genes were 
up-regulated and 549 genes were down- regulated; 1,779 
differentially expressed genes in capsule 3 vs capsule 2, with 
865 genes were up-regulated and 914 genes were down- 
regulated. Comparison between differentially expressed 
genes between the three samples is shown in Figure 7. 
Capsule 3 vs capsule 2 comparisons had the most spe- 
cific differentially expressed genes. Interestingly, capsule 
3 vs capsule 2 and capsule 2 vs capsule 1 had 748 com- 
mon differentially expressed genes, capsule 3 vs capsule 
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Figure 5 Functional classification and pathway assignment of unigenes by KEGG. The results are summarized in five main categories: 






A, Cellular Processes; B, Environmental Information Processing; C, Genetic Information Processing; D, Metabolism; E, Organismal Systems. The 






y-axis indicates the name of the KEGG metabolic pathways. The x-axis indicates the percentage of the number of genes annotation under the 




pathway in the total number of genes annotation. 









2 and capsule 3 vs capsule 1 had 744 common differen- 
tially expressed genes, the two were very close. 

Differentially expressed genes of from the DGE ana- 
lysis were further analyzed using GO and KEGG enrich- 
ment in order to determine their potential function and 
within a metabolic pathway. GO enrichment classifica- 
tion using GOseq [52]. GO classification of differentially 
expressed genes between the different capsule stages is 
showed in Additional file 2: Figure S3 (A-C). 419 differ- 
entially expressed genes of capsule 2 vs capsule 1 were 
classified into the biological process, and 133 genes were 
identified as involved in oxidation-reduction process 
[GO: 0055114]. 703 genes were classified into cellular 
component. 903 genes were classified into molecular 
function, 903 genes involved in catalytic activity 
[GO:0003824], 137 genes involved in oxidoreductase 
activity [GO: 0016491]. 588 genes among differentially 
expressed genes of capsule 3 vs capsule 1 were classified 
into the biological process, 226 genes involved in 
oxidation-reduction process [GO: 0055114]. 507 genes 
were classified into cellular component, 765 genes were 



classified into molecular function, 240 genes participated 
in oxidoreductase activity [GO: 0016491]. 657 genes among 
differentially expressed genes of capsule 3 vs capsule 2 
were classified into the biological process, 255 genes 
involved in oxidation-reduction process [GO: 0055114]. 
1301 genes were classified into cellular component, 841 
genes were classified into molecular function, 272 genes 
participated in oxidoreductase activity [GO: 0016491]. 

KEGG enrichment classification (hypergeometric test) 
was used to identify significant enrichment pathways of 
differentially expressed genes between the different stages of 
Uncaria capsule development using a False Discovery Rate 
(FDR) <0.05. Flavonoid biosynthesis [PATHWAY: ko00941], 
phenylpropanoid biosynthesis [PATHWAY: ko00940], 
phenylalanine metabolism [PATHWAY: ko00360] and bio- 
synthesis of secondary metabolites [PATHWAY: koOlllO] 
were significantly enriched pathways among the differen- 
tially expressed genes between the capsule stages. Indole 
alkaloid biosynthesis [PATHWAY:ko00901] was enriched 
in capsule 3 vs capsule 2, but did not reach significant level 
(FDR <0.05) [in Additional file 2: Figure S4 (A-C)]. 
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Figure 7 Venn diagram of differentially expressed genes. The 

sum of the numbers in each large circle represents total number of 
differentially expressed genes between combinations, the overlap 
part of the circles represents common differentially expressed genes 
between combinations. 



Analysis of genes with homology to known genes 
involved in TIA biosynthesis 

There were 68 unigenes that may be involved in the TIA 
biosynthesis due to their homology to genes known to 
be involved in TIA from other species by NCBI Nr 
comparisons [53]. To narrow down this list we looked 
for genes that showed expression profiles consistent with 
changes in RIN and IRN content during capsule devel- 
opment. As previously mentioned RIN and IRN content 
is strongly elevated between capsule 1 and capsule 2, 
then decreased significantly between capsule 2 and 
capsule 3. RIN and IRN content is also lower in capsule 
3 than that present in capsule 1. 17 of the putative TIAs 
biosynthesis genes showed significantly up or down- 
regulated expression (in Additional file 1: Table S6). 
However, only ten genes had expression profiles either 
similar or inverse to RIN and IRN accumulation in at 
least 2 stages. Anthranilate synthase (AS; EC 4.1.3.27) 
(compl2708_c0) and loganin acid methyltransf erase 
(LAMT; EC 2.1.1.50) (comp33316_c0) expression pro- 
files matched of the direction of RIN and IRN content 
change, whereas SLS (comp35216_c0) had the inverse 
expression profiles. The expression profiles of TDC 
(comp34949_c0), GIOH (comp33830_c0) and STR 
(compl6908_c0 and comp38465_cl) had two phases 
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consistent with RIN and IRN changes. A total of 64 
unigenes of KO annotation involved in the synthesis of TIAs 
were based on enzymes of KEGG pathway. DGE analysis 
found that 11 unigenes showed significantly up/down-regu- 
lated expression changes (in Additional file 1: Table S7). AS 
(compl2708_c0) expression changes with direction of RIN 
and IRN content change was consistent. The up/down regu- 
lated expression changes of 3-dehydroquinate dehydratase/ 
shikimate dehydrogenase (EC 4.2.1.10) (comp22817_c0) and 
STR (compl6908_c0) with direction of RIN and IRN 
content change, two phases were consistent. 

Analysis of putative genes involved in the late steps of 
RIN and IRN biosynthesis 

Enzymes classes putatively involved of biosynthesis of 
RIN and IRN include oxidoreductase, transferase, and 
isomerase. CYP450 is one of the oldest protein families, 
has catalytic oxidation function of carbon-carbon bond, 
alkyl hydroxylation and hydroxyl oxidation, and plays an 
important role in plant secondary metabolites synthesis 
process [54-57]. Oxidoreductases involved in RIN and 
IRN synthesis are therefore likely to come from the 
CYP450 family. There are 30 varieties of methyltransfer- 
ase but only o-methyl transferase is likely to be involved 
in RIN and IRN [58,59]. Isomerase are relatively un- 
known, however isomers conversion between RIN and 
IRN could be realized through enzyme catalysis, as nat- 
ural transformation of the chemical equilibrium is likely 
to be small. 

From the linear ia transcriptome data, a total of 193 
CYP450, 280 methyltransferase and 144 isomerase uni- 
genes were identified. DGE analysis showed 29 CYP450s 
were significantly up/down-regulated during capsule 
development (in Additional file 1: Table S8A). Only the 
expression of 1 unigene (comp28593_c0) matched changes 
in RIN and IRN content. The expression profiles of 9 
unigenes (comp21921_c0, comp26673_c0, comp30485_c0, 
comp31075_c0, comp32966_c0, comp34791_cl, comp35950_c0, 
comp37580_c0 and comp38378_c0) had similar to RIN 
and IRN content in two stages. 21 methyltransferase uni- 
genes possessed significantly differential expression in the 
process of capsule development (in Additional file 1: Table 
S8 B). 1 unigene (comp33316_c0) expression profile was 
consistent to RIN and IRN content. Whereas, the expres- 
sion profile of 6 unigenes (compl2388_c0, comp29880_c0, 
comp34948_c0, comp36199_c0, comp36942_c0, and 
comp41847_c0) where similar to RIN and IRN content 
in two stages. 12 unigenes putatively encoding isomer- 
ases had significantly altered expression in the process 
of capsule RIN and IRN content change (in Additional 
file 1: Table S8C), with 3 unigenes (comp22426_c0, 
comp28178_c0 and comp38594_c0) showing changes 
consistent with the TIAs RIN and IRN content in two 
stages. 



STR is a starting point and key enzyme of TIAs 
synthesis involved in the early part of TIAs synthesis. 
STR expression will affect the metabolic activities of the 
downstream branch. The gene expression profiles of 
TIAs downstream branch are expected to be similar to 
the expression profiles of STR, The gene expression 
profiles not only include up/down-regulated expression, 
but also include gene expression variation and amplitude 
from each stage of capsule 1 capsule 2 capsule 3. 
Expression profiles of CYP450, methyltransferase and 
isomerase hierarchically clustered with that of STR 
[STR^^ compl6908_c0 and comp38465_cl; STRi^o^ 
compl6908_c0 and comp36585_cO) (shown in Figures 8 
and 9). Ten candidate genes were found to cluster 
closely with STR (correlation >0.99). The candidate 
genes of CYP450 had 4 unigenes (comp31075_c0, 
comp28593_c0, comp34791_cl and comp26673_c0); 
candidate genes of methyltransferase had 3 unigenes 
(comp31161_c0, comp33316_c0 and comp34948_c0) 
and candidate genes of isomerase had 3 unigenes 
(comp22426_c0, comp38594_c0 and comp28178_c0). 
Through the alignment of transcriptome database, the 
candidate unigene sequences and amino acid sequences 
of "late step genes" of RIN and IRN biosynthesis were 
obtained and showed in Additional file 1: Table S9. 

Quantitative PGR analysis of candidate TIA biosynthesis 
genes 

Quantitative PGR analysis was performed on 13 selected 
genes from CYP450, methyltransferase, isomerases and 
STR functional categories putatively involved in TIA 
biosynthesis of Uncaria, The relative expression changes 
of the candidate genes are shown in Figure 10. Generally 
during the developmental process of capsule 1 to capsule 
2, and capsule 2 to capsule 3 growth, the expression pro- 
files of the genes assayed show highest gene expression 
at the capsule 2 stage, and lowest gene expression at 
capsule 3, mirroring the measured RIN and IRN content 
profiles, and confirming the DGE expression data. The 
STR genes (compl6908_c0, and comp38465_cl) how- 
ever showed only marginal increases at capsule 2 stage. 
The relative expression of one of the CYP450 genes 
(comp26673_c0) was inverse to the expected expression, 
with lowest expression at capsule 2 and then increasing 
at capsule 3 (Figure 10 and Additional file 1: Table S8). 

Discussion 

The medicinal parts of Uncaria are its "stem and stem 
hook" and the effective pharmacological constituents are 
RIN and IRN. In Uncaria plants, if a branch has a 
capsule, then there will be no stem hook. A branch with 
a stem hook growing on it, will not have flowers nor 
capsules next to it. In such case, there are two kinds of 
leaves and stems, one is the twig containing the capsule 
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Figure 8 Expression profiles clustering of CYP450, methyltransferase and isomerase genes from A to C respectively at three different 
capsule developmental stages. Hierarchical clustering of expression data for 29 candidate CYP450 genes, 21 candidate methyltransferase genes 
and 12 candidate isomerase genes respectively using STRNr as reference profiles. Expression ratios are expressed as Log 10 values. 
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Figure 9 Expression profiles clustering of CYP450, methyltransferase and isomerase genes from A to C respectively at three different 
capsule developmental stages. Hierarchical clustering of expression data for 29 candidate CYP450 genes, 21 candidate methyltransferase genes 
and 12 candidate isomerase genes respectively using STRko as reference profiles. Expression ratios are expressed as Log 10 values. 
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Figure 10 Quantitative PCR analysis of the candidate downstream genes. A STR; B CYP450; C Methyltransferases; D Isomerases. 



and the other is the twig containing a stem hook. "Stem 
and stem hook", leaf, stem and capsule of linear ia con- 
tain RIN and IRN, and different parts contain different 
content (data not shown). However, the difference be- 
tween the RIN and IRN content of the "stem and stem 
hook" and that of stem on the branch with capsule is 
not significant. In addition, the serious lignification of 
the stem influences the integrity of RNA. By contrast, 
the difference between the RIN and IRN content of leaf 
on the branch with capsule and that of the leaf on the 
branch with stem hook is significant. No proteins related 
to the biosynthesis of the two TIAs have been found 
through dimensional electrophoresis analysis (data not 
shown). Among all the tissues and organs, the capsule 
has the highest RIN and IRN content and during the 
developmental process there is an increase and then 
decrease in the accumulation of the RIN and IRN. These 
changes in RIN and IRN content in the capsule reflect 
different rates of synthesis and provides a means for 
identifying candidate genes involved in the biosynthesis 
of RIN and IRN through analyzing the gene expression 
profiles [60]. 

Next generation sequencing technology (NGS) offers a 
shortcut for research on functional genomics, compara- 
tive genomics [61-63] and genetic analysis [64] of non- 
model plants. With the progress of NGS, and the 
increase of sequencing throughput, improvements in 
accuracy, and decreased costs, enable this technology to 



be broadly applied to various research areas [65,66]. 
There is little to no genome information of Uncaria or 
the related species. Therefore sequencing RNA using 
NGS from a specific period of plant development or 
treatment provides the information necessary for de 
novo assembly of the transcriptome; generating a refer- 
ence transcriptome sequence for non model plants that 
can provide the basis for finding genes associated with 
particular important functions [67-69]. DGE can quickly 
and thoroughly analyze the gene expression under a 
variety of tissues and conditions [70]. DGE however 
requires a reference sequence to align the relative small 
read lengths. Combining DGE with an RNA-sequencing 
generated reference transcriptome provides an excellent 
combination for analysis of non model plant species. 
Confirmation of the expression profiles of a subset of 
genes identified DGE using qPCR indicates the accuracy 
of this method for quickly identifying candidates based 
on expression profiles. 

Predicting the biosynthesis pathway of RIN and IRN 
lays the theoretical foundation for the regulation of the 
RIN and IRN accumulation as well as their metabolic 
engineering research [71]. The formation of TIAs 
precursor involves the shikimate pathway of tryptophan 
metabolism and MEP pathway which are the common 
pathways of terpene biosynthesis. During the comparison 
and annotation of TIA like genes, we found all the encod- 
ing genes of the related enzymes in the two pathways in 
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our cDNA library. Moreover, the encoding genes of STR 
which include the CYP450, methyltransferase and isomer- 
ase of later biosynthetic step in RIN and IRN biosynthesis 
were also found. By regulating the metabolic pathway, the 
accumulation of the RIN and IRN in the plant can be 
improved [72,73]. This can be started from many aspects 
such as improving the TIAs biosynthesis pathway activity 
and upstream shunt volume of carbon skeleton. Improve- 
ment of biosynthesis pathway activity can be fulfilled 
through improving the activity of the key enzyme in the 
pathway [74,75]. The increase of carbon shunt volume 
needs to shunt more carbon from the glycometabolism to 
enter the target pathway. Additionally, weaken or cutting 
off the carbon channel of non target pathways etc. The 
MEP pathway exists in the plastid, the amount of which 
can be increased by employing the matrilineal inheritance 
feature of plastid. In this way, more carbon in the 
glycometabolism will enter the RIN and IRN biosynthesis 
pathway. Alternatively, increase/enhance the transporter 
quantity/transport ability on the plastid membrane in- 
volved in the intermediate products of the MEP pathway 
of RIN and IRN biosynthesis through transformation of 
the organelle. Ultimately the substrate amount of iridoid 
pathway is increased. The auto-regulation mechanism of 
plant biosynthetic pathways is complex and strict. Con- 
struction of the vitro RIN and IRN biosynthesis pathway 
using information from the related research on vinblastine 
and vincristine [76-79] may provide a means of accelerat- 
ing our knowledge of the pathways. The low yield of 
natural active substances is one of the factors limiting the 
development and utilization of Chinese medicine re- 
sources. The use of engineering techniques to improve 
production of natural active substance is an effective way 
to solve the contradiction between supply and demand. 
This study presented the pathway of RIN and IRN biosyn- 
thesis, and laid the theoretical foundation for the medi- 
cinal potential development of RIN and IRN. If the yield 
issues of RIN and IRN are effectively solved then they will 
have a wider range of applications in the treatment of 
Alzheimer's disease. 



Conclusion 

A combination of De novo transcriptome sequencing 
and DGE analysis based on the next generation sequen- 
cing technology was shown to be a powerful method for 
identifying candidate genes encoding enzymes respon- 
sible for the biosynthesis of novel secondary metabolites 
in a non-model plant. Through this method, biosynthetic 
encoding genes of Uncaria TIAs biological precursor 
were found. CYP450, methyltransferase and isomerase 
were selected as potential candidates involved in late 
biosynthesis of RIN and IRN. The transcriptome data 
from this study provides an important resource for 



understanding the formation of major bioactive constitu- 
ents in the capsule extract from Uncaria, 

Methods 

Sample collection and preparation 

Uncaria was collected from Nanning, Guangxi, China, 
and identified as Uncaria rhynchophylla by Ma Xiaojun. 
From blossom to ripened capsules, the capsules were 
collected once every ten days and six times totally. The 
collected capsules were divided into two parts. One part 
was frozen in liquid nitrogen for isolation of RNA, and 
stored at -80°C until use. The other part was used for 
the determination of RIN and IRN content by high per- 
formance liquid chromatography after drying and crush- 
ing. The plant samples were stored in Plant Physiology 
Laboratory of Northeast Agricultural University. 

The analysis results from high performance liquid 
chromatography showed a single peak curve of RIN and 
IRN content changes (Figure 2A) [80]. Sampling point 3 
had the highest RIN and IRN content and was desig- 
nated capsule 2. Sampling point 1 (designated capsule 1) 
and 5 (designated capsule 3) had the low RIN and IRN 
content. RNA samples from sample point 1 (capsule 1), 
3 (capsule 2) and 5 (capsule 3) were used for high- 
throughput sequencing. 

2 (iL from each of three samples were mixed for 
transcriptome sequencing. 4 (iL from each of three 
samples were used for expression profile sequencing. 
The RNA samples from sample point 1, 3 and 5 were 
named capsule 1, capsule 2 and capsule 3 in expression 
profile sequencing. Representative images of sample 
point 1, 3 and 5 are shown in Figure 2B,C,D. 

RNA isolation, quantification and qualification 

Total RNA of capsule 1, capsule 2 and capsule 3 were 
isolated using the improved CTAB method [81]. RNA 
degradation and contamination was monitored on 1% 
agarose gels. RNA purity was checked using the Nano- 
Photometer^spectrophotometer (IMPLEN, CA, USA). 
RNA concentration was measured using QubifRNA 
Assay Kit in Qubit®2.0 Flurometer (Life Technologies, 
CA, USA). RNA integrity was assessed using the RNA 
Nano 6000 Assay Kit of the Bioanalyzer 2100 system 
(Agilent Technologies, CA, USA). 

Sample preparation for sequencing 

A total amount of 3 (ig RNA per sample were used as 
input material for the RNA sample preparations. All four 
samples had RNA integrity number values above 8.0. 
Sequencing libraries were generated using Illumina Tru- 
Seq™RNA Sample Preparation Kit (Illumia, San Diego, 
USA) following manufacturers recommendations and 
four index codes were added to attribute sequences to 
each sample. Briefly, mRNA was purified from total 
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RNA using poly-T oligo- attached magnetic beads. Frag- 
mentation was carried out using divalent cations under 
elevated temperature in Illumina proprietary fragmenta- 
tion buffer. First strand cDNA was synthesized using 
random oligonucleotides and Superscript II. Second 
strand cDNA synthesis was subsequently performed 
using DNA Polymerase I and RNase H. Remaining over- 
hangs were converted into blunt ends via exonuclease/ 
polymerase activities and enzymes were removed. After 
adenylation of 3' ends of DNA fragments, Illumina PE 
adapter oligonucleotides were ligated to prepare for 
hybridization. In order to select cDNA fragments of 
preferentially 200 bp in length the library fragments 
were purified with AMPure XP system (Beckman 
Coulter, Beverly, USA). DNA fragments with ligated 
adaptor molecules on both ends were selectively enriched 
using Illumina PCR Primer Cocktail in a 10 cycle PCR 
reaction. Products were purified (AMPure XP system) 
and quantified using the Agilent high sensitivity DNA 
assay on the Agilent Bioanalyzer 2100 system. 

Clustering and sequencing 

The clustering of the index-coded samples was per- 
formed on a cBot Cluster Generation System using Tru- 
Seq PE Cluster Kit v3-cBot-HS (Illumia) according to 
the manufacturer s instructions. After cluster generation, 
the library preparations were sequenced on an Illumina 
Hiseq 2000 platform. 

Quality control 

Raw data (raw reads) of fastq format were firstly 
processed through our self-written perl scripts. In this 
step, clean data (clean reads) were obtained by removing 
reads containing adapter, reads containing ploy-N and 
low quality reads from raw data. At the same time, Q20, 
Q30, GC-content and sequence duplication level of the 
clean data were calculated. All the downstream analyses 
were based on clean data with high quality. 

Transcriptome assembly 

Pair-end sequencing was performed. The left files (read 
1 file) from all libraries/samples were pooled into one 
big left.fq file, and right files (read 2 file) into one big 
right.fq file. Transcriptome assembly was accomplished 
based on the left.fq and right.fq using Trinity [41] with 
min_kmer_cov set to 2 and all other parameters set 
default. 

Gene functional annotation 

Gene function was annotated based on the following 
databases: Nr (NCBI non-redundant protein sequences); 
Nt (NCBI non-redundant nucleotide sequences); Pfam 
(Protein family); KOG/COG (Clusters of Orthologous 
Groups of proteins); Swiss-Prot (A manually annotated 



and reviewed protein sequence database); KO (KEGG 
Ortholog database); GO (Gene Ontology). 

Reads mapping to the reference genome 

Sequencing-received raw image data was transformed by 
base calling into sequence data. Prior to mapping reads 
to the reference database, we filtered all sequences to 
remove low quality sequences. A preprocessed database 
of nucleotide tag was created using our transcriptome 
reference database. The clean reads of the samples were 
mapped to the reference sequence of Uncaria transcrip- 
tome data, and obtained from the annotation informa- 
tion of each sample using RSEM software. 

Quantification of gene expression level 

HTSeq vO.5.3 was used to count the reads numbers 
mapped to each gene. And then RPKM of each gene was 
calculated based on the length of the gene and reads 
count mapped to this gene. RPKM, Reads Per Kilobase 
of exon model per Million mapped reads, considers the 
effect of sequencing depth and gene length for the reads 
count at the same time, and is currently the most 
commonly used method for estimating gene expression 
levels [47]. 

Differential expression analysis 

Prior to differential gene expression analysis, for each se- 
quenced library, the read counts were adjusted by edgeR 
program package through one scaling normalized factor. 
Differential expression analysis of two conditions was 
performed using the DEGseq R package (1.12.0). The P 
values were adjusted using the Benjamini & Hochberg 
method [51]. Corrected P-value of 0.005 and log2 (Fold 
change) of 1 were set as the threshold for significantly 
differential expression. 

GO and KEGG enrichment analysis of differentially 
expressed genes 

Gene Ontology (GO) enrichment analysis of differen- 
tially expressed genes was implemented by the GOseq 
R package, in which gene length bias was corrected. 
GO terms with corrected P-value less than 0.05 were 
considered significantly enriched by differential expressed 
genes. 

KEGG is a database resource for understanding high- 
level functions and utilities of the biological system, 
such as the cell, the organism and the ecosystem, 
from molecular-level information, especially large-scale 
molecular datasets generated by genome sequencing 
and other high-throughput experimental technologies 
(http://www.genome.jp/kegg/). We used KOBAS soft- 
ware to test the statistical enrichment of differential 
expression genes in KEGG pathways. 
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Quantitative PCR 

RNA for quantitative PCR analysis was the same 
samples as used for DGE. Primers were designed using 
Primer 5.0, and shown in Additional file 1: Table SIO. 
Reverse transcription used PrimerScoript RT reagent Kit 
With gDNA eraser (perfect Real Time) (TaKaRa, RR047A) 
and Random 6 mers primer obtained cDNA. The standard 
curve was constructed using cDNA generated 1:5 with 
H2O. The cDNA was diluted in the 1:20 ratio as the 
sample reaction solution. 

Quantitative PCR reactions used ABI7500 quantitative 
PCR, and the relative standard curve method was 
adopted to analyze the relative expression of genes. This 
study used SYBR" Premix Ex TaqTM II (TaKaRa, 
DRR820A) kit, which reaction system was 25 (iL, final 
concentration of primers was 0.2 (imol-L"^, template 
was 1.0 (iL, ROX Reference Dye II was 0.5 (iL, H2O was 
used for complement system, and set up six repeats. 
Negative control used 1.0 [xL water instead of the 
template. Reaction conditions were 95°C for 5 s in order 
to activate enzyme reaction. Two step cycles were then 
used; 95°C for 5 s, then 60°C for 30 s, 40 cycles; solubil- 
ity curve conditions were 95°C for 15 s, 60°C for 1 min, 
95°C for 30 s, 60°C for 15 s. 

Availability of supporting data 

The data sets supporting the results of this article are 
available in the BioProject (BioProject:PRJNA222278, 
PRJNA222280, PRJNA222281, PRJNA222282) repository 
of the National Center for Biotechnology Information: 
http://www.ncbi.nlm.nih.gov/bioproject/?term=222278, 
http://www.ncbi.nlm.nih.gov/bioproject/?term=222280, 
http://www.ncbi.nlm.nih.gov/bioproject/?term=222281, 
http://www.ncbi.nlm.nih.gov/bioproject/?term=222282. 
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unigenes. Table S3. Coding genes and CDS involved in TIAs biosynthesis by 
Nr database annotation. Table S4. (A) Coding genes of cytochrome P450 
protein. (B) Coding genes of methyltransferase. (C) Coding genes of 
isomerase. Table S5. Encoding genes involved in TIAs biosynthesis by 
KO database annotation. Table S6. RPKM and up/down-regulated 
expression of encoding genes involved in TIAs biosynthesis by Nr 
database annotation. Table S7. Up/down regulated expression of 
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Table S8. (A) Differential expression analysis of cytochrome P450 genes. 
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Additional file 2: Figure SI. Error rate distribution of Capsule 
transcriptome sequencing. The x-axis indicates the nucleotide position of 
reads. The y-axis indicates the single-base error rate. 1-100 bp is a first 
terminal sequencing error rate distribution for the two-terminal sequencing 
reads, 100-200 bp is the other terminal sequencing error rate distribution. 
Figure S2. G/C content distribution. 0-100 bp is the GC content distribution 



of the first reads sequencing in paired-end sequencing, 100-200 bp is 
the GC content distribution of the other reads sequencing. 
Figure S3. (A) GO classification of differentially expressed genes in 
capsule 2 vs capsule 1 . (B) GO classification of differentially expressed 
genes in capsule 3 vs capsule 1. (C) GO classification of differentially 
expressed genes in capsule 3 vs capsule 2. Figure S4. (A) KEGG pathway 
enrichment scatter diagram in capsule 2 vs capsule 1. (B) KEGG pathway 
enrichment scatter diagram in capsule 3 vs capsule 1. (C) KEGG pathway 
enrichment scatter diagram in capsule 3 vs capsule 2. 
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